Parsing Key Word Grammars

William A. Martin 

Key word grammars are defined to be the same as context free
grammars, except that a production may specify a string of arbitrary
symbols. These grammars define languages similar to those used in
the programs CARPS¹ and ELIZA². We show a method of implementing
the LR(k) parsing algorithm for context free grammars which can be
modified slightly in order to parse key word grammars. When this is
done the algorithm can use many of the techniques used in the ELIZA
parse. Therefore, the algorithm helps to show the relation between
the classical parsers and key word parsers.



1. The LR(k) Parsing Scheme

We indicate the basic idea of the LR(k) parsing scheme by giving an
example. A formal description and discussion of the method can be found in
Knuth³. Consider the following context free grammar:

1  S → S #
2  S → E + E
3  E → F * F
4  E → F
5  F → x
6  F → y

Fig. 1
The string x * y + x# lies in the language generated by this grammar.
We can parse this string with the LR(k) algorithm. This algorithm makes
one pass through the string from left to right. The parameter k refers
to the number of characters which the algorithm "looks ahead" at each
step. We will take k = 1. The complete parse of the string is shown in
Fig. 2.
Fig. 2
Parsing the string x * y + x#
This figure shows the successive stages of the push-down-stack used in
the parse. Each rectangle is named by the symbol Sᵢ at its top left; the
successive stages of the stack are:
Let us see how the information in the successive rectangles, Sᵢ, and
corresponding stages of the stack are generated. Each Sᵢ is a set of states
of the form [right side of a production, terminal character] with a
horizontal bar placed just before one character on the right side of the
production. The terminal character is the character which must be the next
input character if a reduction of the stack corresponding to the production
whose right side is given in the state is to be made. To form S₀ we ask what
productions could possibly lead to the first character of any input string.
Since all derivations of an acceptable input string must start with
production 1, whose right side is S#, we start S₀ with the state [¯S#, ε],
the ε indicating that we are looking for the string S# followed by no input
character, and the placement of the bar before the S indicating that we have
not yet found any of the characters of this string. Now from the grammar we
see that in order to find the first character of this string, S, which is to
be followed by a #, we must find the string E + E followed by a #, so we add
the state [¯E + E, #] to S₀. Similarly, to find an E followed by a + we must
find either F * F (corresponding to production 3) followed by a + or the
string F (corresponding to production 4) followed by a +. This process of
adding to S₀ the states which we should be looking for as a consequence of
the states already in S₀ is called "computing the closure of S₀". The
complete closure of S₀ is shown in Fig. 2.

Now that we have S₀ we place it on the stack. We examine each state
in S₀ to see if we have found all of the specified characters in any of them.
We have not, so we add the first input character, x, to the stack. We then
compute S₁ by placing in S₁ every state in S₀ which has the bar immediately to
the left of the character, x, which was just placed on the stack. We place the
bar over the x to indicate that the x has been "found". S₁ thus has two states,
[x̄, *] and [x̄, +]. Next we compute the closure of S₁. It is already closed.
We then place S₁ on the stack. Now we proceed as we did when we placed S₀ on
the stack. We look to see if any states in S₁ have all of the characters
found, and both of them do. Since the next input character is * we ignore
the second state, [x̄, +], and make the reduction, x → F, corresponding to
the first. x is said to be the current "handle". To make this reduction
we remove x and S₁ from the stack and replace them with F. Then we form S₂
as the closure of those states in S₀ which have an F preceded by the bar. We
continue as above until the parse is completed with the generation of the
last Sᵢ.
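
The steps just described can be sketched in code. The following Python is
a minimal illustration under stated assumptions, not the memo's own
implementation: it recomputes each Sᵢ as it is needed, represents a state as
a (production index, bar position, terminal character) triple, uses the empty
string for "no input character", and writes the left side of production 1 as
S' (an assumed name) to keep it apart from the S of production 2. The
function names closure, advance and parse are chosen here for illustration.

```python
# Grammar of Fig. 1 as (left side, right side) pairs.
# "S'" is a name assumed here for the left side of production 1.
GRAMMAR = [
    ("S'", ("S", "#")),
    ("S", ("E", "+", "E")),
    ("E", ("F", "*", "F")),
    ("E", ("F",)),
    ("F", ("x",)),
    ("F", ("y",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
NO_INPUT = ""  # assumed marker for "followed by no input character"


def first_terminals(symbol, seen=frozenset()):
    """Terminals that can begin a string derived from `symbol`.

    Assumes the grammar has no empty right sides, as in Fig. 1.
    """
    if symbol not in NONTERMINALS:
        return {symbol}
    if symbol in seen:
        return set()
    found = set()
    for lhs, rhs in GRAMMAR:
        if lhs == symbol:
            found |= first_terminals(rhs[0], seen | {symbol})
    return found


def closure(states):
    """Add every state we must look for as a consequence of `states`."""
    result, work = set(states), list(states)
    while work:
        prod, bar, follower = work.pop()
        rhs = GRAMMAR[prod][1]
        if bar == len(rhs) or rhs[bar] not in NONTERMINALS:
            continue
        rest = rhs[bar + 1:]
        followers = first_terminals(rest[0]) if rest else {follower}
        for index, (lhs, _) in enumerate(GRAMMAR):
            if lhs != rhs[bar]:
                continue
            for terminal in followers:
                new = (index, 0, terminal)
                if new not in result:
                    result.add(new)
                    work.append(new)
    return result


def advance(states, symbol):
    """Move the bar over `symbol` wherever it is wanted, then close."""
    moved = {(p, bar + 1, f) for p, bar, f in states
             if bar < len(GRAMMAR[p][1]) and GRAMMAR[p][1][bar] == symbol}
    return closure(moved)


def parse(text):
    """Return the production numbers of Fig. 1 in the order they are reduced."""
    tokens = list(text) + [NO_INPUT]
    sets = [closure({(0, 0, NO_INPUT)})]  # the stack of S_i, starting with S_0
    reductions, position = [], 0
    while True:
        upcoming = tokens[position]
        finished = [p for p, bar, f in sets[-1]
                    if bar == len(GRAMMAR[p][1]) and f == upcoming]
        if finished:
            prod = finished[0]
            lhs, rhs = GRAMMAR[prod]
            reductions.append(prod + 1)
            if prod == 0:
                return reductions
            del sets[-len(rhs):]  # remove the handle and its S_i
            sets.append(advance(sets[-1], lhs))
        else:
            following = advance(sets[-1], upcoming)
            if not following:
                raise SyntaxError(f"unexpected {upcoming!r} at position {position}")
            sets.append(following)
            position += 1


print(parse("x*y+x#"))  # production numbers in order of reduction
```

Under these assumptions S₀ comes out with eight states, two of which have x
as their right side, matching the two states of S₁ described above.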

Note that there are only a finite number of possible states and so
there are only a finite number of distinct Sᵢ. It is possible to compute
once and for all each Sᵢ which will occur in parsing any string which is
generated by a given grammar which can be parsed by this algorithm. Thus
one could set up an array which would give the action which the parser
should take for each combination of an Sᵢ with an input symbol.
The parse is then reduced to table look-up and the mechanism is very similar
to a precedence algorithm parse. However, if there are as many as 200
productions, this array could be very large (even if simplifications to
remove redundant cases are made).

2. An Implementation of the LR(1) Parsing Scheme

We now introduce an alternative approach. The array approach summarizes
each state Sᵢ in a single number. However, if the next state is very "similar"
to the last state then an acceptable alternative is to try to represent the
state by many numbers, only a few of which will change with each change of
state. The push-down-list is then used to save only those numbers which change.
In implementing this approach we assume that the depth of the stack can always
be specified as an entry of the array which defines the state. As we will see,
the state can then be defined by allotting entries in this array for each
combination of handle and character which can follow it in some parse. A
number of entries equal to the number of characters in the handle would be
allotted for each such combination. However, in an attempt to simplify the
procedure without destroying its usefulness we will not keep this much
information. We will only keep a list of the characters which cannot follow the
right side of a given production in any sentential form. Then each right side
need only appear once in the array defining the state, instead of once for
each character which can follow it in some sentential form. Then we can
specify the array defining the state as containing one symbol for each symbol
in each production of the grammar to be parsed. For the grammar above the
array will have 17 entries.
array entry:             1   2   3   4   5   6   7   8   9  10  11  12     13  14  15  16  17
corresponding symbol:    S   # (S)   E   +   E (S)   F   *   F (E)   F  (E)≠*   x (F)   y (F)

Fig. 3
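
The count of 17 can be checked mechanically. The short Python sketch below
lays out one entry per right-side symbol followed by one entry for the left
side, production by production. The function name lay_out_entries and the
tuple layout are assumptions made for illustration, as is the name S' for the
left side of production 1.

```python
# Grammar of Fig. 1; "S'" is an assumed name for the left side of production 1.
GRAMMAR = [
    ("S'", ["S", "#"]),
    ("S", ["E", "+", "E"]),
    ("E", ["F", "*", "F"]),
    ("E", ["F"]),
    ("F", ["x"]),
    ("F", ["y"]),
]


def lay_out_entries(grammar):
    """Return one (symbol, production number, is_left_side) tuple per entry."""
    entries = []
    for number, (lhs, rhs) in enumerate(grammar, start=1):
        # Right-side symbols first, in order, then the left side.
        entries.extend((symbol, number, False) for symbol in rhs)
        entries.append((lhs, number, True))
    return entries


ENTRIES = lay_out_entries(GRAMMAR)
for index, (symbol, number, is_left) in enumerate(ENTRIES, start=1):
    shown = f"({symbol})" if is_left else symbol
    print(f"{index:2d}  {shown:5s} production {number}")
```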
Entries 1 and 2 correspond to symbols 1 and 2 on the right side of production
1. Entry 3 corresponds to the symbol on the left side of production 1. The
remaining entries correspond to the remaining productions in the same way.
The entries corresponding to the left side of a production are filled with a
pointer to a function which reduces the stack if that production is found as
a "handle" (in the sense explained in Section 1) not followed by any of the
specified characters. For example, entry 13 above says that the reduction
F → E should only be made if the next input character is not *. We call the
above array the state array. We also set up a second array whose function is
to describe the entries in the state array. This second array is called the
property array and its entries are in one-to-one correspondence with the
entries of the state array. If an entry in the state array corresponds to a
terminal, the corresponding entry in the property array is zero. If an entry
in the state array corresponds to a non-terminal on the right side of a
production, the entry in the property array is a negative number whose
absolute value is a pointer to a list of every entry in the state array which
corresponds to the first character on the right side of a production whose
left side is this non-terminal. (This list is used to form closures
efficiently.) If an entry in the state array corresponds to a non-terminal
which is the left side of a production, the corresponding entry in the
property array is a pointer to a list of those entries in the state array
which correspond to an appearance of this non-terminal on the right side of
some production. These entries in the property array and the entries in the
state array corresponding to the left side of a production are made once and
for all and do not change during a parse. The current state of the parse is
kept in the entries of the state array corresponding to the symbols on the
right sides of productions and on the push-down-stack.
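
As an illustration of the property array, the sketch below builds it for the
grammar of Fig. 1. It is only a sketch under assumptions: where the text
distinguishes the two kinds of pointer by sign, the code uses the tags
"closure" and "appears at" with a Python list in place of the pointer, and
the names build_property_array and PROPERTY, and the name S' for the left
side of production 1, are hypothetical.

```python
# Grammar of Fig. 1; "S'" is an assumed name for the left side of production 1.
GRAMMAR = [
    ("S'", ["S", "#"]),
    ("S", ["E", "+", "E"]),
    ("E", ["F", "*", "F"]),
    ("E", ["F"]),
    ("F", ["x"]),
    ("F", ["y"]),
]


def build_property_array(grammar):
    """Return a dict from entry number (1-based) to that entry's property."""
    nonterminals = {lhs for lhs, _ in grammar}
    symbols, is_left, starts = {}, {}, {}
    entry = 0
    for lhs, rhs in grammar:
        for offset, symbol in enumerate(rhs):
            entry += 1
            symbols[entry], is_left[entry] = symbol, False
            if offset == 0:
                # First right-side character of a production of `lhs`.
                starts.setdefault(lhs, []).append(entry)
        entry += 1
        symbols[entry], is_left[entry] = lhs, True

    properties = {}
    for entry, symbol in symbols.items():
        if is_left[entry]:
            # Left side: where this non-terminal appears on a right side.
            properties[entry] = ("appears at",
                                 [e for e in symbols
                                  if symbols[e] == symbol and not is_left[e]])
        elif symbol in nonterminals:
            # Non-terminal on a right side: entries needed for its closure.
            properties[entry] = ("closure", starts[symbol])
        else:
            properties[entry] = 0  # terminal
    return properties


PROPERTY = build_property_array(GRAMMAR)
print(PROPERTY[15])  # left-side entry for F
print(PROPERTY[10])  # the second F of production 3
```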

Initially the push-down-stack is empty and these state array entries are
all zero. We then change the state array to represent the state S₀ as follows.
Referring to Fig. 2 we see that the stack has depth 1 (u = 1) after S₀ is
placed on it. During the parse the stack grows deeper and is then reduced,
but whenever S₀ is on the top of the stack, the stack has depth 1. Therefore,
S₀ can be interpreted as specifying that we are looking for the character S
in production 1, E in production 2, F in productions 3 and 4, x in production
5 and y in production 6, when the stack is of depth 1. We thus set the state
array to indicate this by placing a 1 in the entries corresponding to these
characters. The array then has the form shown in line 3 of Fig. 4.
The stack is empty.

We now describe the procedure for going from state A₀ to state A₁, which
corresponds to the procedure for going from S₀ to S₂ in Fig. 2. Each input
character has associated with it a list of the entries in the state array
which correspond to that character. The current input character here, x, thus
has entry 14 associated with it. We then go to entry 14 and see if it
contains a 1, indicating that x is wanted at the current push-down-stack level,
which is 1. It does and so we advance to the next entry, 15. Checking the
corresponding entry in the property array we see that we should make the
reduction x → F. Since the handle is only one character long the push-down-stack
entry      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
symbol     S   # (S)   E   +   E (S)   F   *   F (E)   F (E)   x (F)   y (F)   depth, u
A₀         1   0       1   0   0       1   0   0       1       1       1       1
    stack: ()
    sentential form: x * y + x#
    input: x

A₁         1   0       1   0   0       1   2   0       1       1       1       2
    stack: (|(0 9)|)
    sentential form: F * y + x#
    input: *

A₂         1   0       1   0   0       1   2   3       1       3       3       3
    stack: (|(0 9)|(0 10)(1 14)(1 16)|)
    sentential form: F * y + x#
    input: y

A₃         1   0       1   2   0       1   0   0       1       1       1       2
    stack: (|(0 5)|)
    sentential form: E + x#
    input: +

A₄         1   0       1   2   3       3   0   0       3       3       3       3
    stack: (|(0 5)|(0 6)(1 8)(1 12)(1 14)(1 16)|)
    sentential form: E + x#
    input: x

A₅         1   2       1   0   0       1   0   0       1       1       1       2
    stack: (|(0 2)|)
    sentential form: S#
    input: #

Fig. 4
Entries in the state array at the steps during the parse when a
new input character is examined.
depth remains 1. The property list entry tells us that entries 8, 10, and 12
correspond to F. Checking entry 12 we see that it is 1 so we advance to entry
13. Here we see that a reduction should be made if the next input character is
not *, but it is, so we abandon this and check entry 10. Entry 10 is 0,
indicating that this F is not needed. Finally we check entry 8. Entry 8 is
1 so we advance to entry 9. We see from the property list that entry 9
corresponds to a terminal, *, so we place a 2 in entry 9 indicating that we
need a * at depth 2. We save the old value of entry 9 on the push-down-stack,
to be restored when the stack is reduced back to depth 1. This brings us to
state A₁ in Fig. 4, which corresponds to S₂ in Fig. 2. The next input character
is a * which sends us to entry 9; entry 9 contains a 2 and so we advance to
entry 10. The corresponding entry in the property array tells us that entry
10 corresponds to a non-terminal, so we place a 3 in entry 10, saving the
previous contents on the push-down list. We then obtain the list of entries
needed to compute the closure. These are 14 and 16; we go to 14 and 16 and
place a 3 in them, again saving the old contents. Since 14 and 16 correspond
to terminals the closure is complete. This brings us to state A₂ in Fig. 4.
Proceeding in this way we parse the input string as shown in Fig. 4.
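
The procedure of this section can be sketched as a short program. The Python
below is an illustrative reading of the walk-through, not a transcription of
the memo's program. Its assumptions: the property array is replaced by the
helper functions right_entries and closure_entries; the blocked characters are
given by a table BLOCKED_BY taken from entry 13 of Fig. 3; candidate entries
are tried from the highest entry number down, as in the walk-through; the left
side of production 1 is named S'; and the returned snapshots list is a
hypothetical addition for inspecting the state array.

```python
GRAMMAR = [("S'", "S#"), ("S", "E+E"), ("E", "F*F"),
           ("E", "F"), ("F", "x"), ("F", "y")]
# Production number -> input characters that block its reduction (Fig. 3).
BLOCKED_BY = {4: {"*"}}

# Lay out the entries as in Fig. 3. Index 0 is a sentinel so that entries
# are 1-based and entry 1 counts as the start of a production.
SYMBOL, PRODUCTION, IS_LEFT = [None], [None], [True]
for number, (lhs, rhs) in enumerate(GRAMMAR, start=1):
    for symbol in list(rhs) + [lhs]:
        SYMBOL.append(symbol)
        PRODUCTION.append(number)
        IS_LEFT.append(False)
    IS_LEFT[-1] = True
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def right_entries(symbol):
    """Entries where `symbol` appears on the right side of a production."""
    return [e for e in range(1, len(SYMBOL))
            if SYMBOL[e] == symbol and not IS_LEFT[e]]


def closure_entries(nonterminal):
    """Entries holding the first right-side symbol of its productions."""
    return [e for e in range(1, len(SYMBOL))
            if not IS_LEFT[e] and IS_LEFT[e - 1]
            and GRAMMAR[PRODUCTION[e] - 1][0] == nonterminal]


def parse(text):
    """Return (production numbers in order of reduction, state snapshots)."""
    state = [0] * len(SYMBOL)
    stack, depth, reductions, snapshots = [], 1, [], []

    def want(entry, level, saved):
        """Mark `entry` and its closure as wanted at `level`, saving old values."""
        queue = [entry]
        while queue:
            e = queue.pop(0)
            if state[e] == level:
                continue
            saved.append((state[e], e))
            state[e] = level
            if SYMBOL[e] in NONTERMINALS:
                queue.extend(closure_entries(SYMBOL[e]))

    want(1, depth, [])  # state A_0; nothing is saved, the stack starts empty
    snapshots.append((state[1:], depth))
    characters = list(text) + [""]
    for position, found in enumerate(text):
        upcoming = characters[position + 1]
        while True:
            for e in reversed(right_entries(found)):
                blocked = (IS_LEFT[e + 1]
                           and upcoming in BLOCKED_BY.get(PRODUCTION[e], ()))
                if state[e] == depth and not blocked:
                    break
            else:
                raise SyntaxError(f"unexpected {found!r} at position {position}")
            if not IS_LEFT[e + 1]:  # more of this right side is still wanted
                depth += 1
                stack.append([])
                want(e + 1, depth, stack[-1])
                snapshots.append((state[1:], depth))
                break
            number = PRODUCTION[e]
            lhs, rhs = GRAMMAR[number - 1]
            reductions.append(number)
            for _ in range(len(rhs) - 1):  # reduce the stack by the handle
                for old, entry in reversed(stack.pop()):
                    state[entry] = old
                depth -= 1
            if number == 1:
                return reductions, snapshots
            found = lhs  # the new non-terminal is treated like a found symbol
    raise SyntaxError("input ended before the parse was complete")


reductions, snapshots = parse("x*y+x#")
print(reductions)
```

In this sketch the left-side entries of the state array are simply left at
zero, and each group on the stack holds the (old value, entry) pairs saved at
one depth level.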

3. Key Word Grammars

We define a key word grammar to be the same as a context free grammar
except that the set of terminal characters is left unspecified and productions
may contain arbitrary strings of the unspecified terminal characters. Since
we can't list all of the terminal characters we let α stand for any string of
zero or more terminal characters. A key word grammar is a set of productions
of the form A → X₁ ... Xₙ where each Xᵢ is either an intermediate, a
terminal, or the symbol α, and A is an intermediate. The strings generated
by the grammar are thus patterns containing the symbol α. A string lies in
the language generated by the grammar if it can be made to match one of the
patterns generated by the grammar.
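
What it means for a string to match a pattern containing α can be shown with a
small sketch. The Python below handles a single, already generated pattern by
translating it to a regular expression; it does not generate patterns from a
grammar. The names ALPHA, pattern_to_regex and matches are assumptions made
for this illustration.

```python
import re

ALPHA = object()  # stands for α: any string of zero or more terminal characters


def pattern_to_regex(pattern):
    """Translate a list of terminals and ALPHA markers into a compiled regex."""
    parts = [".*" if symbol is ALPHA else re.escape(symbol) for symbol in pattern]
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, string):
    """True if the whole of `string` can be made to match `pattern`."""
    return pattern_to_regex(pattern).fullmatch(string) is not None


print(matches(["a", ALPHA, "b"], "axyzb"))
print(matches(["a", ALPHA, "b"], "axyz"))
```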

4. Parsing Key Word Grammars 

Obviously, key word grammars are as general as context free grammars and
so there will be key word grammars which cannot be parsed with an algorithm
less powerful (in some sense) than a non-deterministic push-down automaton.
At the other end of the scale, since the strings, α, may contain any terminal
characters the precedence relation ·> holds between every pair of strings of
terminal characters and so a precedence algorithm may not be sufficient to
parse key word grammars.

We give here a variation of the LR(1) algorithm which seems to have
enough power to parse interesting key word grammars. If the algorithm is too
slow in practice one might investigate a precedence algorithm. If it is not
powerful enough we could expand it to LR(k).

The algorithm will not parse all key word grammars. We therefore should
define a test which a grammar must pass which will guarantee that the grammar
can be parsed by the algorithm. One of the restrictions we make is that
no production can have two adjacent α's or an α as the rightmost character of
the right side of a rule.
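
The one restriction stated here is easy to check mechanically. The sketch
below tests a single right side for it; it covers only this restriction and is
not the complete test the text calls for. The names ALPHA and
violates_restriction are hypothetical.

```python
ALPHA = "α"  # assumed marker for the symbol α on a right side


def violates_restriction(rhs):
    """True if `rhs` ends with α or contains two adjacent α's."""
    if rhs and rhs[-1] == ALPHA:
        return True
    return any(a == ALPHA and b == ALPHA for a, b in zip(rhs, rhs[1:]))


print(violates_restriction(["a", ALPHA, "b"]))
print(violates_restriction(["a", ALPHA, ALPHA, "b"]))
```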

5. A Key Word Grammar Parsing Algorithm

Our algorithm is a modification of the procedure given in Section 2.
There, we looked for the characters of a production one by one. Now consider
the production A → a α b. The α means that after we find the "a" we should
begin looking for a "b" and there can be any number of intervening characters.
Thus, in the notation in Section 2, if we start looking for "b" at depth 2
we want to find "b" at depth 2 or any greater depth. We will indicate this
by placing a -2 instead of a 2 in the state array entry. Now consider the
grammar

1.  A → a α B
2.  A → a b B
3.  B → c

This grammar is ambiguous because the string abc has the two parsings:

        A                A
      / | \            / | \
     a  α  B          a  b  B
        |  |                |
        b  c                c

Yet we do not want to throw this grammar out. We can use it to express the
useful idea that any string starting with "a" and ending with "c" should be
matched with production 1 unless it has the specific form abc, in which case
production 2 should be used. Let us see what this implies for the modification
of the procedure in Section 2. If we try to parse the string "abc" we will find
the "a" at depth 1, so we begin looking for a "c" at any depth ≥ 2 and a "b"
at depth 2. We find a "b" at depth 2 and so we look for the "c" at depth 3.
But we are already looking for the "c" at any depth ≥ 2. Therefore, we let
the specific depth 3 dominate the "≥ 2" specification. We place the latter on
a temporary list to be restored if we go on to depth 4, since we don't want
to reject a string like abdc. Finally, in a conflict between "≥ 2" and
"≥ 4", we would allow "≥ 4" to dominate, pushing "≥ 2" onto the main stack.
This is the idea of our modification to Section 2. In order to
implement it we indicate in the property array whether or not each character
on the right side of some production is preceded by α, but we do not put any
entries in the state array for the α's. Then, during parsing, we enter a
negative number instead of a positive number in the state array if the entry
corresponds to a character preceded by α or if we are computing the closure
of such an entry. Note that on making reductions handles corresponding to
the same production can be of different lengths depending on the length of
the strings matched by the α's. Therefore we must use the entries in the
state array to find the left end of a handle.
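
The depth conventions of this section can be written down compactly. The
sketch below encodes only what the text states: a positive entry means "wanted
at exactly this depth", a negative entry -d means "wanted at depth d or any
greater depth", a specific depth dominates a "≥" specification, and of two "≥"
specifications the deeper one dominates. The function names wanted and
resolve, and the strings naming where the displaced value goes, are
assumptions; cases the text does not describe are left unimplemented.

```python
def wanted(value, depth):
    """True if a state array entry holding `value` is wanted at `depth`."""
    if value > 0:
        return depth == value    # a specific depth
    if value < 0:
        return depth >= -value   # -d stands for "depth d or any greater depth"
    return False                 # 0: not wanted


def resolve(old, new):
    """Return (dominant value, value set aside, where it is set aside)."""
    if old < 0 < new:
        # A specific depth dominates a ">=" specification.
        return new, old, "temporary list"
    if old < 0 and new < 0 and -new > -old:
        # Of two ">=" specifications the deeper one dominates.
        return new, old, "main stack"
    raise NotImplementedError("case not described in the text")


print(resolve(-2, 3))   # "c" wanted at depth >= 2, then at depth 3
print(resolve(-2, -4))  # conflict between ">= 2" and ">= 4"
```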

6. An Example

Consider the grammar: 

1  S → S #
2  S → α E + E
3  E → F * F
4  E → F
5  F → x
6  F → y

Given the string xx + x# the algorithm finds the parsing
Major steps in the parse are shown in Fig. 5. Note that if the reduction of the
initial x to E prevented a correct reduction the string would be rejected.
There are no grammars for which the algorithm will accept incorrect strings,
but there are some for which it will reject correct ones.

                       α
entry      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
symbol     S   # (S)   E   +   E (S)   F   *   F (E)   F (E)   x (F)   y (F)
A₀         1   0      -1   0   0      -1   0   0      -1      -1      -1
    stack: ()
    sentential form: xx + x#

A₁         1   0      -1   2   0      -1   0   0      -1      -1      -1
    stack: (|(0 5)|)
    sentential form: Ex + x#

A₂         1   0      -1   2   0      -1   0   0      -1      -1      -1
    stack: (|(0 5)|)
    sentential form: EE + x#

A₃         1   0      -1   2   3      -1   0   0      -1      -1      -1
    stack: (|(0 5)|(0 6)|)
    sentential form: EE + x#

A₄         1   2      -1   0   0      -1   0   0      -1      -1      -1
    stack: (|(0 2)|)
    sentential form: S#

Fig. 5
Parsing the string xx + x# with the new algorithm.

7. Conclusion

Exploring this idea further would make an interesting project for someone
interested in parsing. For example, we could use all the ideas Weizenbaum
uses such as precedence among productions when one calls for a stack
reduction and the other doesn't. Also, we can parse any key word grammar
by modifying Earley's⁴ procedure along similar lines.